Improved Performance Measures for Voice Activity Detection

Authors

  • Simon Graf
  • Tobias Herbig
  • Markus Buck
  • Gerhard Schmidt
Abstract

Voice activity detection is an essential part of many speech processing algorithms. The requirements of the speech application determine the design of voice activity detection. Some applications need low-latency results, whereas for others the accuracy of speech detection is more important. Performance is generally evaluated by Receiver Operating Characteristic (ROC) curves, which perform a static analysis averaged over speech and nonspeech segments, respectively. We adopt the ROC curves but evaluate them for specific speech classes, e.g., voiced or unvoiced speech, to describe the overall accuracy of speech detection. In addition, we present a new measure for the dynamic behavior that considers the delay and latency of speech onset and offset detection. Finally, we present a unified measure for both aspects. This measure may be used to find appropriate voice activity detection features for a given application. An automotive noise scenario is employed to demonstrate the measures, as it contains stationary and non-stationary noise.
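The two kinds of evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' measure, whose definition is not given here. It assumes a per-frame VAD feature value (`scores`), per-frame reference labels with the hypothetical class names "voiced", "unvoiced" and "nonspeech", and a simple threshold detector. The function names `roc_points` and `onset_delay` are made up for this illustration.

```python
def roc_points(scores, labels, target_class, thresholds):
    """Return one (false_alarm_rate, detection_rate) pair per threshold.

    The detection rate is computed only over frames of target_class,
    the false-alarm rate over frames labeled "nonspeech" (assumed label).
    """
    target = [s for s, l in zip(scores, labels) if l == target_class]
    noise = [s for s, l in zip(scores, labels) if l == "nonspeech"]
    points = []
    for t in thresholds:
        detection_rate = sum(s > t for s in target) / len(target)
        false_alarm_rate = sum(s > t for s in noise) / len(noise)
        points.append((false_alarm_rate, detection_rate))
    return points


def onset_delay(scores, labels, threshold):
    """Return, for each reference speech onset, the number of frames until
    the feature first exceeds the threshold inside that speech segment.
    None marks a segment that was never detected."""
    delays = []
    previous = "nonspeech"
    for i, label in enumerate(labels):
        if label != "nonspeech" and previous == "nonspeech":
            j = i
            # Walk forward through the segment until the first detection.
            while (j < len(labels) and labels[j] != "nonspeech"
                   and scores[j] <= threshold):
                j += 1
            detected = j < len(labels) and labels[j] != "nonspeech"
            delays.append(j - i if detected else None)
        previous = label
    return delays


# Toy data (invented for illustration, not taken from the paper).
labels = ["nonspeech", "nonspeech", "unvoiced", "voiced",
          "voiced", "nonspeech", "voiced", "voiced"]
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.4, 0.7, 0.6]

print(roc_points(scores, labels, "voiced", [0.35, 0.5]))
print(roc_points(scores, labels, "unvoiced", [0.35, 0.5]))
print(onset_delay(scores, labels, 0.5))
```

Evaluating the detection rate per class shows, for example, that a feature may separate voiced frames from noise well while missing unvoiced frames. The onset delay captures the dynamic behavior that a frame-averaged ROC curve hides.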


Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Power Spectral Deviation-Based Voice Activity Detection Incorporating Teager Energy for Speech Enhancement

In this paper, we propose a robust voice activity detection (VAD) algorithm to effectively distinguish speech from non-speech in various noisy environments. The proposed VAD utilizes power spectral deviation (PSD), using Teager energy (TE) to provide a better representation of the PSD, resulting in improved decision performance for speech segments. In addition, the TE-based likelihood ratio and...


Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy for Speech Enhancement

In this paper, we propose a novel voice activity detection (VAD) algorithm using global speech absence probability (GSAP) based on Teager energy (TE) for speech enhancement. The proposed method provides a better representation of GSAP, resulting in improved decision performance for speech and noise segments by the use of a TE operator which is employed to suppress the influence of a noise signa...


Improved voice activity detection based on a smoothed statistical likelihood ratio

This paper presents the behavioural mechanism of a statistical model-based voice activity detector (VAD), featuring a likelihood ratio test for the activity decision. From investigation of the VAD, it is found that detection errors could occur frequently at speech offset regions because of the delay term in the decision-directed parameter estimator, employed for the estimation of an unknown para...


Towards improving statistical model based voice activity detection

Statistical model-based voice activity detection (VAD) is commonly used in various speech-related research and applications. In this paper, we try to improve the performance of statistical model-based VAD via a new feature extraction method. Our main innovation is that we apply Mel-frequency subband coefficients with power-law nonlinearity as the feature for statistical model-based VAD instea...




Publication date: 2014